[Python] How can I speed up unpickling large objects if I have plenty of RAM?

Posted by conradlee on Stack Overflow
Published on 2010-05-04T15:31:48Z Indexed on 2010/05/04 15:48 UTC

It takes me up to an hour to load a NetworkX graph data structure using cPickle. The graph is 1 GB when stored on disk as a binary pickle file.

Note that the file itself loads into memory quickly; the slow part is the unpickling. In other words, when I run:

import cPickle as pickle

# Read the whole file into memory, closing it afterwards
with open("bigNetworkXGraph.pickle", "rb") as f:
    binary_data = f.read()  # This part doesn't take long

graph = pickle.loads(binary_data)  # This takes ages

How can I speed this last operation up?
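To put numbers on the claim that the read is fast and the unpickling is slow, the two steps can be timed separately. This is only a minimal sketch: `time_read_and_loads` is a hypothetical helper name, and the fallback import is an assumption so the same code also runs on Python 3, where the plain `pickle` module uses its C accelerator automatically.

```python
import time

try:
    import cPickle as pickle  # Python 2, as in the question
except ImportError:
    import pickle  # Python 3


def time_read_and_loads(path):
    """Return (object, seconds spent reading, seconds spent unpickling)."""
    t0 = time.time()
    with open(path, "rb") as f:
        binary_data = f.read()  # raw disk read only
    t1 = time.time()
    obj = pickle.loads(binary_data)  # object reconstruction only
    t2 = time.time()
    return obj, t1 - t0, t2 - t1
```

If the description above holds, calling `time_read_and_loads("bigNetworkXGraph.pickle")` should report almost all of the elapsed time in the third value.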

Note that I have tried pickling the data using both binary protocols (1 and 2), and it doesn't seem to make much difference which one I use. Also note that although I am using the "loads" (meaning "load string") function above, it is loading binary data, not ASCII data.
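For anyone wanting to reproduce the protocol comparison on a smaller object, something like the following would do. It is a sketch under stated assumptions: `dumps_with_protocols` is a hypothetical helper, and the small adjacency dictionary is only a stand-in for a real NetworkX graph.

```python
try:
    import cPickle as pickle  # Python 2, as in the question
except ImportError:
    import pickle  # Python 3


def dumps_with_protocols(obj, protocols=(0, 1, 2)):
    """Serialize obj once per protocol and return {protocol: serialized data}."""
    return {p: pickle.dumps(obj, p) for p in protocols}


# Stand-in data (assumption): a tiny adjacency mapping instead of a real graph
sample = {node: [node + 1, node + 2] for node in range(1000)}
serialized = dumps_with_protocols(sample)

for protocol in sorted(serialized):
    data = serialized[protocol]
    # loads() accepts the output of every protocol, text-based or binary
    assert pickle.loads(data) == sample
    print("protocol %d: %d bytes" % (protocol, len(data)))
```

Protocol 0 is the text-based format and protocols 1 and 2 are the binary ones; `loads` detects the format from the data itself, so no protocol argument is needed when reading.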

I have 128 GB of RAM on the system I'm using, so I'm hoping that somebody will tell me how to increase some read buffer buried in the pickle implementation.
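Since the raw read is already fast, a larger read buffer may not be where the time goes. One cheap experiment is to pause Python's cyclic garbage collector while unpickling, because the collector can be triggered repeatedly as millions of container objects are created. Whether this helps for this particular graph is an assumption that would need measuring, and `loads_without_gc` is a hypothetical helper name.

```python
import gc

try:
    import cPickle as pickle  # Python 2, as in the question
except ImportError:
    import pickle  # Python 3


def loads_without_gc(binary_data):
    """Unpickle with the cyclic garbage collector paused, then restore its state."""
    was_enabled = gc.isenabled()
    gc.disable()  # avoid collector passes while objects are being created
    try:
        return pickle.loads(binary_data)
    finally:
        if was_enabled:
            gc.enable()  # only re-enable if it was on before the call
```

With plenty of RAM available, pausing the collector for the duration of the load costs little, since reference counting still frees objects that do not form cycles.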

© Stack Overflow or respective owner
